April 2021

Basics of spatial data

Spatial data can come in 2 formats: vector or raster. Vector data is comprised of vertices and paths that connect to form points, lines, and polygons. “Shapefiles” are one of the most common vector formats you see in spatial data. Rasters, on the other hand, are made of a matrix of grid cells or pixels. For simple maps, you’ll most often work with shapefiles (e.g. sampling points, state lines, pond outlines).

Map projections

To view the Earth on our screen, we need to figure out how to represent a 3-D object on a 2-D plane. This can be accomplished by projecting the Earth onto a flat surface. Map projections necessarily involve the loss of some data (area, shape, direction, distance, scale)

Mercator projection: preserves direction, but distorts size

Gall-Peters projection: preserves size, but distorts distance

Map projections, cont.

Important note! The default projection in R is to use a rectangular projection with the aspect ratio chosen so that longitude and latitude scales are equivalent at the center of the picture. Thus, when working with data sources without defined projections, you may have to define the projection. I’ll show an example of this later on.

Some key spatial packages:

# load relevant packages

library(dplyr)
library(stringr)
library(ggplot2)
library(ggmap)
library(RColorBrewer)
library(maps) # basic maps: continents, countries, states, and counties
library(mapdata) # higher resolution maps
library(rgdal) # reading, writing, converting between spatial data in dozens of formats
library(maptools) # combinig spatial data
library(mapproj) # projecting spatial data
library(sp) # basic functions for spatial data
library(raster) # working with raster data 
library(grDevices) # 

Making basic maps

Maps can be created in base R or ggplot. The map function uses map databases to create maps from vector data. Many arguments in the map function are similar to plots in base R. Different databases can have different names, extents, resolution, etc. There’s no ‘Russia’ map in the worldHires database, but there is a USSR.

# Base maps 
map(database = "world", regions = "Russia", col = "lightblue", fill = TRUE,
    xlim=c(100,190), ylim=c(50,75))

Making basic maps, cont.

You can select multiple regions, colors, and can layer maps from multiple databases. You can also add elements with points at any latitude and longitude. Note that y is latitude and x is longitude in maps – don’t get these mixed up!

map(database = "state", regions = c("Florida","Georgia","Alabama"), 
    col = c("pink", "darkorchid", "coral"), fill = TRUE, border = NA)
map("county", regions = c("Florida","Georgia","Alabama"), add = T, col = "white")
disney <- c(28.3852, -81.5639) # latitude/longitude points for Disney world
points(x = disney[2], y = disney[1], pch="*", cex=4, col = "yellow") 

# note the switching of lat/long to work with x/y 

Maps with raster data

R can also call raster data from many databases. Beware: these can take a long time to download and can cause R to crash – save your workspace before loading!

# future climate data -- note temps are degrees C * 10
climate_change <- raster:::getData('CMIP5', var='tmax', res=10, rcp=85, model='AC', year=70)
plot(climate_change$ac85tx701)

Making maps in ggplot

Use map_data to turn map data fom the maps package into data frame that ggplot can read. A couple quick fixes make this map a little nicer: coord_map() gives a localized aspect ratio, low and high change the color gradient, and trans puts population on a log scale.

states <- map_data("state") 
x <- read.csv("Maps_presentation_files/state_pop.csv", stringsAsFactors = FALSE)
us_pop <- inner_join(states, x, by = "region")
us_pop$Population<- as.numeric(us_pop$Population)

ggplot(data = us_pop) +
  geom_polygon(aes(x = long, y = lat, fill = Population, group = group), color = "white") + scale_fill_gradient(low = "red", high = "blue", trans = "log10") + 
    coord_map() + theme_bw() + theme(plot.margin=unit(c(-0.30,0,0,0), "null"))

Animated maps: time series

Let’s make a map!

I need to create a study area map for my manuscript. I want to show my species localities and all IL streams. I also want to code each watershed within my map in a different color. I have a .csv file with lat/long data for my species localities, as well as a shapefile for IL streams (USGS Watershed Boundary Dataset). I also want to add an outline of the state of IL.

localities <- read.csv("Maps_presentation_files/illinois_localities.csv")
head(localities)
##   Catalogue Family__            Genus_spec Day_    Month_ Year_ F__Vouchered
## 1     42702       22 Cyprinella spiloptera    7   October  1997          284
## 2     44780       22 Cyprinella spiloptera   24 September  1997           68
## 3     44805       22 Cyprinella spiloptera   17 September  1997           25
## 4     47221       22 Cyprinella spiloptera    8      June  1998           90
## 5     47530       22 Cyprinella spiloptera   22      July  1998          102
## 6     47687       22 Cyprinella spiloptera   25      June  1998           57
##   F__Collected species_pe                   Remarks Latitude Longitude
## 1          284         33                           40.03838 -87.55764
## 2           68         26 boat electroshocking, 1+2 39.25108 -88.17355
## 3           25         17    33 min. electric seine 41.67160 -88.44269
## 4           90         19                           42.48222 -89.24772
## 5          102         15                           39.46521 -88.23019
## 6           57         16                           41.84560 -89.48353
##   F__specimens
## 1           15
## 2           15
## 3           12
## 4           15
## 5           14
## 6           10

Let’s make a map, cont.

First, let’s check out our streams file using the readOGR function to read in spatial data. We can see from the summary that this is a polygon shapefile that is projected in the UTM NAD83 zone 16 projection (more on that later). We can also see that there is an attribute for HUC4; this defines the code for each watershed in the state of Illinois.

## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/kbja10/Github/making_maps_in_R_updated/Maps_presentation_files/Stream_HUC4/stream_HUC4.shp", layer: "stream_HUC4"
## with 13 features
## It has 3 fields
## Integer64 fields read as strings:  ID GRIDCODE
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min       max
## x  148717.9  468731.4
## y 4183300.7 4724312.1
## Is projected: TRUE 
## proj4string :
## [+proj=utm +zone=16 +datum=NAD83 +units=m +no_defs]
## Data attributes:
##       ID              GRIDCODE              HUC4    
##  Length:13          Length:13          Min.   :512  
##  Class :character   Class :character   1st Qu.:709  
##  Mode  :character   Mode  :character   Median :712  
##                                        Mean   :696  
##                                        3rd Qu.:712  
##                                        Max.   :714

Next, let’s get a map of IL from the maps package

illinois <- map("state", regions = "Illinois", fill = TRUE)

Looks great! Now, let’s put our streams on the map:

map("state", regions = "Illinois")
plot(watersheds, add = TRUE)

What happened?

We just saw that our streams and our state plot out separately, but are not lining up when plotted together.

  • Watersheds are defined by the UTM NAD83 projected coordinate system
  • Metadata for the US Census Bureau (where the data from the ‘state’ database comes from) is in the NAD83 geographic coordinate system
  • Units in a geographic coordinate system are decimal degrees (e.g. lat/longs), while the units in a projected coordinate system are feet/meters.
summary(illinois)
##       Length Class  Mode     
## x     329    -none- numeric  
## y     329    -none- numeric  
## range   4    -none- numeric  
## names   1    -none- character
projection(illinois)
## [1] NA

Fixing the problem

So, what I need to do is turn the Illinois map object into a spatial polygon object and define the projection using map2SpatialPolygons. Then I can project it into the same projected coordinate system as my watersheds data with spTransform.

illinois <- map2SpatialPolygons(illinois, IDs=illinois$names, 
                                proj4string=CRS("+proj=longlat +datum=NAD83"))
summary(illinois)
## Object of class SpatialPolygons
## Coordinates:
##         min       max
## x -91.50136 -87.49638
## y  37.00161  42.50774
## Is projected: FALSE 
## proj4string : [+proj=longlat +datum=NAD83 +no_defs]
projection(illinois)
## [1] "+proj=longlat +datum=NAD83 +no_defs"

Fixing the problem, cont.

Now our Illinois shapefile projection is defined. However, note the x and y are in decimal degrees. Contrast with the summary of our watersheds shapefile:

summary(watersheds)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min       max
## x  148717.9  468731.4
## y 4183300.7 4724312.1
## Is projected: TRUE 
## proj4string :
## [+proj=utm +zone=16 +datum=NAD83 +units=m +no_defs]
## Data attributes:
##       ID              GRIDCODE              HUC4    
##  Length:13          Length:13          Min.   :512  
##  Class :character   Class :character   1st Qu.:709  
##  Mode  :character   Mode  :character   Median :712  
##                                        Mean   :696  
##                                        3rd Qu.:712  
##                                        Max.   :714

Fixing the problem, cont.

To get the Illinois map into a projected coordinate system, let’s first create an object to store the projection information from the watersheds file and then project Illinois into a projected coordinate system:

projection(watersheds)
## [1] "+proj=utm +zone=16 +datum=NAD83 +units=m +no_defs"
UTMnad83<- CRS("+proj=utm +zone=16 +datum=NAD83 +units=m +no_defs")
illinois_proj<- spTransform(illinois, UTMnad83) # transform to UTMnad83
summary(illinois_proj)
## Object of class SpatialPolygons
## Coordinates:
##       min       max
## x  116589  456858.9
## y 4097263 4712603.0
## Is projected: TRUE 
## proj4string :
## [+proj=utm +zone=16 +datum=NAD83 +units=m +no_defs]

Moment of truth

Now, the x and y for the illinois shapefile are in meters, just like our watersheds file. Plotting them together now results in:

plot(illinois_proj)
plot(watersheds, add = T)

Adding species localities

I have a .csv file with lat/longs for the location of each species collection. Recall that lat/longs are in a geographic coordinate system, and we need to project them into a projected coordinate system so they line up with our watersheds. First, we’ll create a spatial object with ‘coordinates’ function

coordinates(localities)<- c("Longitude", "Latitude") # x=long, y=lat !!
summary(localities)
## Object of class SpatialPointsDataFrame
## Coordinates:
##                 min       max
## Longitude -90.75708 -87.52703
## Latitude   38.02549  42.48222
## Is projected: NA 
## proj4string : [NA]
## Number of points: 86
## Data attributes:
##    Catalogue         Family__   Genus_spec             Day_      
##  Min.   : 28218   Min.   :22   Length:86          Min.   : 1.00  
##  1st Qu.: 45273   1st Qu.:22   Class :character   1st Qu.:13.00  
##  Median : 58658   Median :22   Mode  :character   Median :18.00  
##  Mean   : 66608   Mean   :22                      Mean   :17.72  
##  3rd Qu.: 95917   3rd Qu.:22                      3rd Qu.:24.00  
##  Max.   :103908   Max.   :22                      Max.   :31.00  
##     Month_              Year_       F__Vouchered     F__Collected    
##  Length:86          Min.   :1984   Min.   :  8.00   Min.   :   8.00  
##  Class :character   1st Qu.:1990   1st Qu.: 15.25   1st Qu.:  18.25  
##  Mode  :character   Median :1997   Median : 25.50   Median :  30.50  
##                     Mean   :1996   Mean   : 47.05   Mean   :  71.83  
##                     3rd Qu.:1999   3rd Qu.: 55.75   3rd Qu.:  63.00  
##                     Max.   :2008   Max.   :479.00   Max.   :1354.00  
##    species_pe      Remarks           F__specimens  
##  Min.   : 1.00   Length:86          Min.   : 7.00  
##  1st Qu.:14.00   Class :character   1st Qu.:11.00  
##  Median :18.00   Mode  :character   Median :14.00  
##  Mean   :17.58                      Mean   :12.57  
##  3rd Qu.:21.00                      3rd Qu.:15.00  
##  Max.   :37.00                      Max.   :15.00

Adding species localities, cont.

proj4string(localities) <- CRS("+proj=longlat +datum=NAD83") # define projection
localities_proj<- spTransform(localities, UTMnad83) # transform to UTMnad83
summary(localities_proj)
## Object of class SpatialPointsDataFrame
## Coordinates:
##                 min       max
## Longitude  178696.1  454246.2
## Latitude  4209127.0 4705766.8
## Is projected: TRUE 
## proj4string :
## [+proj=utm +zone=16 +datum=NAD83 +units=m +no_defs]
## Number of points: 86
## Data attributes:
##    Catalogue         Family__   Genus_spec             Day_      
##  Min.   : 28218   Min.   :22   Length:86          Min.   : 1.00  
##  1st Qu.: 45273   1st Qu.:22   Class :character   1st Qu.:13.00  
##  Median : 58658   Median :22   Mode  :character   Median :18.00  
##  Mean   : 66608   Mean   :22                      Mean   :17.72  
##  3rd Qu.: 95917   3rd Qu.:22                      3rd Qu.:24.00  
##  Max.   :103908   Max.   :22                      Max.   :31.00  
##     Month_              Year_       F__Vouchered     F__Collected    
##  Length:86          Min.   :1984   Min.   :  8.00   Min.   :   8.00  
##  Class :character   1st Qu.:1990   1st Qu.: 15.25   1st Qu.:  18.25  
##  Mode  :character   Median :1997   Median : 25.50   Median :  30.50  
##                     Mean   :1996   Mean   : 47.05   Mean   :  71.83  
##                     3rd Qu.:1999   3rd Qu.: 55.75   3rd Qu.:  63.00  
##                     Max.   :2008   Max.   :479.00   Max.   :1354.00  
##    species_pe      Remarks           F__specimens  
##  Min.   : 1.00   Length:86          Min.   : 7.00  
##  1st Qu.:14.00   Class :character   1st Qu.:11.00  
##  Median :18.00   Mode  :character   Median :14.00  
##  Mean   :17.58                      Mean   :12.57  
##  3rd Qu.:21.00                      3rd Qu.:15.00  
##  Max.   :37.00                      Max.   :15.00

The final product

Now we have everything in order, and we can plot up our spatial objects! Remember that I need to shade my unique watersheds and individual species in unique colors. This is made easy with palettes in RColorBrewer.

water_pal<- brewer.pal(n = 5, name = "Blues") # for my streams 
species_pal<- brewer.pal(n = 6, name = "Dark2")

plot(illinois_proj, col="gray24") # plot projected illinois shapefile
water_col <- 1
for (i in unique(watersheds$HUC4)){
  plot(watersheds[watersheds$HUC4 == i, ], add=TRUE, 
       border = water_pal[water_col], col = water_pal[water_col]) # plot each watershed in a different color
  water_col<- water_col+1 
}

species_col <- 1 
for (i in unique(localities_proj$Genus_spec)){
  plot(localities_proj[localities_proj$Genus_spec == i, ], add=TRUE, 
       pch = 21, bg = species_pal[species_col], col = "black", cex = 1.3)
  species_col<- species_col+1
}

legend("topright", unique(localities_proj$Genus_spec), pch=21, pt.bg=species_pal, col="black", bty="n", text.font=3)

The final product

Another example in ggplot

Let’s make another sampling map of sampling locations around a lake using different sampling gears. We have lat/longs of our sampling locations and have downloaded shapefiles of U.S. lakes from ArcGIS. Our map will also include a scale bar and an inset map of the entire U.S.

all_sampling_points <- read.csv("Maps_presentation_files/all_sampling_points.csv")
ef_dat <- read.csv("Maps_presentation_files/EF spring std site locations2.csv")
lakes <- readOGR("Maps_presentation_files/ne_10m_lakes") # file path to lake shapefile
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/kbja10/Github/making_maps_in_R_updated/Maps_presentation_files/ne_10m_lakes", layer: "ne_10m_lakes"
## with 1354 features
## It has 37 fields
## Integer64 fields read as strings:  scalerank ne_id
lakes <- lakes[complete.cases(lakes$name), ]
great_lakes_map <- subset(lakes, name==c("Lake Ontario","Lake Erie"))
oneida_map <- subset(lakes, name=="Oneida Lake")

Oneida map

First, we’ll make the main sampling map.

mapbox <- c(-76.14, 43.14, -75.7, 43.3) # specify map boundaries
oneida_lake <- get_map(location = mapbox, source = "stamen", 
                       maptype = "terrain", zoom = 13)
# Could pull up google maps if you have a license 
cols <- c(brewer.pal(8, "Dark2"),"#386CB0")

oneida_lake_map <- ggmap(oneida_lake) +
  geom_point(data=all_sampling_points, 
             mapping=aes(x=long, y=lat, color=gear, shape=gear), 
             size=3, stroke = 1.2) +
  scale_color_manual(values=cols[c(4,9,6,3,1)]) +
  scale_shape_manual(values=c(17,5,3,16,6)) +
  geom_line(data=ef_dat, aes(x=long, y=lat, group=Description), 
            color="white", size=0.5) +
  theme(axis.text.x = element_blank(), 
        axis.text.y = element_blank(), axis.ticks = element_blank(),
        rect = element_blank(), axis.title.y=element_blank(),
        axis.title.x=element_blank()) +
  ggsn::scalebar(location = "bottomleft", x.min = -76.13, x.max = -75.7, 
           y.min = 43.145, y.max = 43.25, 
           dist = 5, dist_unit="km", transform = TRUE, 
           model = "WGS84", height = 0.05, 
           st.dist = 0.05, st.bottom=FALSE)

Oneida map, cont.

library(cowplot)
region_map <- ggplot() + 
  geom_polygon(data=map_data("state"), 
               aes(x=long, y=lat,group=group), color="white") + 
  geom_polygon(data=great_lakes_map, 
               aes(x=long, y=lat, group=group), 
               color="lightblue", fill="lightblue") + 
  geom_polygon(data=oneida_map, aes(x=long, y=lat, group=group), 
               color="lightblue", fill="lightblue") + 
  coord_cartesian(xlim=c(-79, -70), ylim = c(40.05, 45.5)) +
  geom_rect(aes(xmin=-76.2, xmax=-75.67, ymin=43.05, ymax=43.4),
            fill = "transparent", color = "red", size = 0.5) +
  theme_map() + theme(panel.background = element_rect(fill = "white"))
region_map

Final map

inset_map <- ggdraw() + draw_plot(oneida_lake_map) +
  draw_plot(region_map, x = 0.62, y = 0.6, width = 0.26, height = 0.26)
inset_map